Distillation of human–object interaction contexts for action recognition
Authors
Abstract
Modeling spatial-temporal relations is imperative for recognizing human actions, especially when a human is interacting with objects while multiple objects appear around the human differently over time. Most existing action recognition models focus on learning the overall visual cues of a scene but disregard a holistic view of human–object relationships and interactions, that is, how a human interacts with objects with respect to short-term task completion and a long-term goal. We therefore argue to improve human action recognition by exploiting both the local and global contexts of human–object interactions (HOIs). In this paper, we propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions through space and time via knowledge distillation for holistic HOI understanding. GLIDN encodes humans and objects into graph nodes and learns local and global relations via a graph attention network. The local context graphs learn the relation between humans and objects at the frame level by capturing their co-occurrence at a specific time step. The global relation graph is constructed based on video-level human and object interactions, identifying their long-term relations throughout a video sequence. We also investigate how knowledge from these graphs can be distilled to their counterparts to improve recognition. Finally, we evaluate our model by conducting comprehensive experiments on two datasets, Charades and CAD-120. Our method outperforms the baselines and counterpart approaches.
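As a rough illustration of the knowledge-distillation idea mentioned in the abstract, the sketch below computes a generic temperature-softened KL-divergence loss between two sets of class logits. It is not the GLIDN loss: the abstract does not give the formulation, so the function names, the temperature value, and the choice of the global graph as teacher and the local graph as student are all assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic distillation loss; the teacher/student roles are hypothetical
    and not taken from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical action logits from a global (video-level) graph branch
# and a local (frame-level) graph branch.
global_logits = [2.0, 0.5, -1.0]
local_logits = [1.0, 1.0, 0.0]
loss = distillation_loss(local_logits, global_logits)
```

The loss is zero when both branches produce the same logits and grows as their softened predictions diverge. Swapping the two arguments gives the opposite distillation direction, which the abstract suggests is also investigated.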
Related resources
Manipulative Action Recognition for Human-Robot Interaction
Recently, human-robot interaction is receiving more and more interest in the robotics as well as in the computer vision research community. From the robotics perspective, robots that cooperate with humans are an interesting application field that is expected to have a high future market potential. A couple of global and also mid-sized companies have come up with quite sophisticated robotic plat...
Contexts for Human Action
We argue that the mathematics developed for the semantics of computer languages can be fruitfully applied to problems in human communication and action.
Informative joints based human action recognition using skeleton contexts
The launching of Microsoft Kinect with skeleton tracking technique opens up new potentials for skeleton based human action recognition. However, the 3D human skeletons, generated via skeleton tracking from the depth map sequences, are generally very noisy and unreliable. In this paper, we introduce a robust informative joints based human action recognition method. Inspired by the instinct of th...
Deep Alternative Neural Network: Exploring Contexts as Early as Possible for Action Recognition
Contexts are crucial for action recognition in video. Current methods often mine contexts after extracting hierarchical local features and focus on their high-order encodings. This paper instead explores contexts as early as possible and leverages their evolutions for action recognition. In particular, we introduce a novel architecture called deep alternative neural network (DANN) stacking alte...
Journal
Journal title: Computer Animation and Virtual Worlds
Year: 2022
ISSN: 1546-427X, 1546-4261
DOI: https://doi.org/10.1002/cav.2107